INTERNATIONAL JOURNAL OF APPLIED LANGUAGE AND CULTURAL STUDIES (IJALSC)
Vol. 2, No. 2, 2019. 


DESIGNING THE SLOVAK MATRIX SENTENCE TEST 


Renata Panocova, Faculty of Arts, Pavol Jozef Šafárik University in Košice, Slovakia,
E-mail: renata.panocova@upjs.sk

Renata Gregova, Faculty of Arts, Pavol Jozef Šafárik University in Košice, Slovakia,
E-mail: renata.gregova@upjs.sk


Abstract. This paper presents partial results of a larger-scale project of designing a matrix sentence test
for Slovak. The main aim is to present and discuss in detail the linguistic aspects of the Slovak matrix sentence
test. First, morphosyntactic criteria are outlined, followed by a description of problematic issues and the
solutions proposed. Second, phonological criteria are given and discussed. In the next step, the matrix test will be
optimized and evaluated in order to measure the speech intelligibility function and to establish the correct
reference data for listeners with normal hearing.
INTRODUCTION


Speech audiometry is a standard method used in the diagnostics of
hearing impairment. A number of standardized speech tests for
individual languages have been designed over the past decades in
order to determine the degree and nature of the impairment. Two main
types of speech tests can be distinguished. One is based on
meaningful, everyday sentences with a variable grammatical structure
(e.g. Plomp & Mimpen, 1979; Nilsson et al., 1994; Kollmeier &
Wesselkamp, 1997; Versfeld et al., 2000; Wong & Soli, 2005; van
Wieringen & Wouters, 2008; Luts et al., 2008; Ozimek et al., 2009;
Nielsen & Dau, 2011). The advantage of this type of test is that it
accurately reflects everyday language in common communicative
situations. Its main disadvantage, on the other hand, is that the
sentences can be easily memorized. The other type of speech test is
the so-called matrix test. A matrix test is characterized by a fixed
order of items (proper name, verb, numeral, adjective, noun as
object), which produces grammatical sentences with an unpredictable
meaning (Hagerman, 1982; Wagener, 1999a, b, c; Ozimek et al., 2010;
Hochmuth et al., 2012; Jansen et al., 2012; Dietz et al., 2014;
Houben et al., 2014; Kollmeier et al., 2015).
The very first matrix test was developed by Hagerman (1982) for
Swedish. His matrix consisted of 10 first names, 10 verbs, 10
numerals, 10 adjectives and 10 nouns, that is, a corpus of 50
distinct words. Test sentences are generated randomly from the matrix
in such a way that all sentences have an identical syntactic
structure, namely SVO, e.g. Kathy sees nine small chairs or Alan
gives eight dark toys. The matrix serves as a basis for test sets of
10 sentences in which each word occurs only once. Choosing one word
from each of the five columns yields 10^5 = 100 000 possible
sentences or, in other words, 10 000 sets of 10 sentences each.
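
The counting argument and the "each word only once per set" constraint can be illustrated with a short Python sketch. It is not Hagerman's original procedure: the function names are hypothetical, words are represented only by their row index (0 to 9) in each column, and independent shuffling of the columns is just one simple way, assumed here, of meeting the constraint.

```python
import random

CATEGORIES = ("name", "verb", "numeral", "adjective", "noun")
WORDS_PER_CATEGORY = 10


def count_sentences(categories=len(CATEGORIES), words=WORDS_PER_CATEGORY):
    """Number of distinct sentences when one word is chosen per category."""
    return words ** categories


def make_test_set(rng):
    """Build one set of 10 sentences, each given as a tuple of row indices.

    Every column is shuffled independently, so each word index occurs
    exactly once per category within the set (illustrative assumption).
    """
    columns = []
    for _ in CATEGORIES:
        order = list(range(WORDS_PER_CATEGORY))
        rng.shuffle(order)
        columns.append(order)
    # zip(*columns) combines the i-th entry of every shuffled column
    return list(zip(*columns))


total = count_sentences()
test_set = make_test_set(random.Random(0))
print(total)                          # 100000
print(total // WORDS_PER_CATEGORY)    # 10000
print(len(test_set))                  # 10
```

Each tuple in `test_set` stands for one sentence; looking the indices up in a concrete 10 x 5 word matrix would give the actual wording.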

In the past years, the matrix test developed in Oldenburg, Germany
(Oldenburger Satztest, OLSA) has become popular and widely used.
Gradually, matrix tests started to be designed for individual
languages. At the moment, nine matrix tests are available as a
medical device: for German, American English, Spanish, Finnish,
Italian, Polish, Russian, French and Turkish. Eleven matrix tests are
officially under development, for British English, Swedish, Danish,
Norwegian, Hebrew, Arabic, Persian, Dutch, Japanese, Chinese and
Hindi (HörTech, Oldenburg, 2015). It is expected that the number of
languages with matrix tests readily available as a medical device
will continue to increase.

There are at least three strong points of 
matrix tests. Undoubtedly, the main advantage 
is that the meaning of the sentences cannot be 
predicted. Given the total number of possible 
generated sentences (see above) and their low 
meaning predictability, it is unlikely that pa- 
tients will memorize them. An advantage fol- 
lowing from this is that the matrix test can be 
conducted repeatedly with the same patient 
without negative influence on the test results. 
Another advantage is that a patient can be tested in any language
for which a standardized matrix test has been developed. In addition,
the audiometrist does not need to speak the language in which the
patient is tested. Matrix tests can be carried out in a so-called
closed test format.
This means that the patient sees the matrix of 
possible words on a computer screen and can 
select the words that he or she just heard. Last 
but not least, individual language versions of 
matrix tests can be easily compared thanks to 
their standardized structure. 

A matrix test has so far been developed for only two Slavic
languages, Russian and Polish. The aim of this paper is to outline
the process of designing a matrix speech test for Slovak.


MATRIX TEST SPEECH 
MATERIAL 


Slovak belongs to the West Slavic group of Slavic languages together
with Czech and Polish. It is an inflectional language with elaborate
declension and conjugation systems. The morphological type of Slovak
played an important role in the selection of linguistic material for
the matrix sentence test given in Table 1.


Table 1. Fifty-word base matrix of the Slovak matrix test

Name     Verb                    Numeral                  Adjective           Noun
Jano     chce ‘wants’            veľa ‘many/much’         ďalších ‘other’     domov ‘houses’
Peter    čaká ‘waits’            tristo ‘three hundred’   nových ‘new’        lavíc ‘benches’
Martin   dáva ‘gives’            sto ‘hundred’            celých ‘whole’      mostov ‘bridges’
Jozo     vidí ‘sees’             štvoro ‘four’            veľkých ‘big’       lámp ‘lamps’
Pavol    hľadá ‘looks for’       dvesto ‘two hundred’     malých ‘small’      vedier ‘buckets’
Mária    drží ‘holds’            pár ‘a few’              starých ‘old’       lyžíc ‘spoons’
Viera    pozná ‘knows’           sedem ‘seven’            dobrých ‘good’      okien ‘windows’
Anna     má ‘has’                osem ‘eight’             zlých ‘bad’         budov ‘buildings’
Jana     berie ‘takes’           málo ‘little/few’        pekných ‘nice’      nožov ‘knives’
Eva      nechce ‘doesn’t want’   mnoho ‘many/much’        iných ‘different’   izieb ‘rooms’


Table 1 presents the fifty-word base matrix of the Slovak matrix
sentence test we developed. It includes ten words from each of five
syntactic categories: personal names, verbs, numerals, adjectives and
nouns. The main factor influencing the selection of lexical items was
the underlying assumption that each sentence must be syntactically
correct when randomly generated during the test. The words were
selected on the basis of the frequency lists in the Slovak National
Corpus (SNC). The frequency lists are available for individual word
classes, which is one of the advantages that make the SNC a
well-designed, balanced and user-friendly corpus. For the matrix
design, the lists of the 1000 most frequent lemmas were used to
ensure that the words are general and commonly used in basic
communicative situations. Another criterion was that all selected
words had to be semantically neutral. Stylistically and emotionally
marked words were excluded.


Personal Names 


The first five names in Table 1 are male names and the remaining
ones are female names. The main selection criteria were absolute
frequency and length. Only names with a maximum of two syllables¹
were selected. Male names appear first in Table 1 because their
absolute frequencies were higher than those of female names: the
value for Jano is 458 882, for Peter 433 677, for Martin 342 806, for
Jozef 317 557 and for Pavol 240 186, whereas the frequency for the
female name Mária is 180 829 and for Eva it is even lower, 75 727.
Still, the absolute frequencies were considerably higher than in the
Russian matrix, where the absolute frequency threshold was 2034
(Warzybok et al., 2015: 2). Warzybok et al.'s decision was based on
the recent frequency dictionary of modern Russian (Sharoff, 2002).
Given that Russian has a much larger word stock, defining frequent
words on the basis of a threshold value of 2034 seems surprising.


1 Although the female personal name Mária with its three syllables
(Má-ri-a) is an exception to this criterion, it has been included in
the matrix due to its high frequency of occurrence (see the frequency
values in the running text).


Verbs 


In Slovak, the system of conjugation is 
complex and often determined by gender of 
the subject of the sentence. This is similar in 
other Slavic languages. Past tense forms were 
excluded, because they are marked for gen- 
der. Therefore, present tense verb forms were 
used instead. Only disyllabic present tense 
verb forms were selected from the top 1000 
most frequent verbs in SNC. The cut-off point 
was an absolute frequency higher than 25 000. 
The most challenging task was to select verbs with a neutral and
sufficiently general meaning to ensure meaningful combinability with
the numerals, adjectives and nouns. It is also worth noting that
verbs are listed in the second column although they were selected
after nouns, adjectives and numerals.


Numerals 


A basic criterion for the selection of numerals was that they had to
be higher than 5. The reason is that only numerals from 5 upwards
combine with nouns and adjectives in the genitive plural, for example
5 domov ‘5 houses (Gen Pl)’. The numerals 2, 3 and 4 combine with the
nominative plural, for instance 3 domy ‘3 houses (Nom Pl)’.
Indefinite numerals are also included. In fact, these had to be taken
into account to make up a list of ten numerals that would meet the
condition of being at most disyllabic while simultaneously being
higher than 5. The absolute frequency threshold for numerals was
8000.
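
The agreement rule behind this criterion can be written down as a small Python sketch. The function name and its parameters are hypothetical, the singular branch for the numeral 1 is added here only for completeness, and special cases such as masculine animate nouns are ignored.

```python
def counted_noun_form(number, nom_sg, nom_pl, gen_pl):
    """Pick the noun form required after a cardinal numeral.

    Simplified rule from the text: 2-4 take the nominative plural,
    5 and higher take the genitive plural. The branch for 1 is an
    added assumption and is not discussed in the paper.
    """
    if number < 1:
        raise ValueError("expects a positive cardinal number")
    if number == 1:
        return nom_sg
    if 2 <= number <= 4:
        return nom_pl
    return gen_pl


# 'house': dom (Nom Sg), domy (Nom Pl), domov (Gen Pl)
for n in (3, 5, 100):
    print(n, counted_noun_form(n, "dom", "domy", "domov"))
```

Because every numeral in the matrix falls into the last branch, a single noun form (the genitive plural) is sufficient for all random combinations.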


Adjectives 


The selection of adjectives was a chal- 
lenging task. All adjectival word forms are 
disyllabic in genitive plural, semantically neu- 
tral yet possible in combination with nouns, 
resulting in grammatically correct, although 
not entirely predictable phrases. In addition, 
the set of adjectival forms had to be phono- 
logically balanced. The absolute frequency 
threshold was 15 000. It is interesting to note that the
above-mentioned Russian matrix includes two colour adjectives,
krasnyj ‘red’ and seryj ‘gray’ (Warzybok et al., 2015: 2). In Slovak,
the equivalent of ‘red’ is červený, which consists of three syllables
and therefore falls outside the criteria. The Polish matrix lists
three colour adjectives, biały ‘white’, żółty ‘yellow’ and czarny
‘black’ (Ozimek et al., 2010), all similar to Slovak in their
genitive plural. However, frequent disyllabic colour adjectives were
excluded in Slovak either due to their difficult consonant clusters,
for example žltý ‘yellow’, or because they resulted in meaningless
combinations with nouns.


Nouns 


Nouns taking the object position in gen- 
erated sentences were selected prior to ad- 
jectives. Concrete and countable nouns were 
considered as appropriate candidates. Only 
disyllabic forms in the genitive plural were 
included. The nouns represent all three genders: three masculine
nouns (domov ‘house (Gen Pl)’, nožov ‘knife (Gen Pl)’, mostov ‘bridge
(Gen Pl)’), five feminine nouns (lavíc ‘bench (Gen Pl)’, lámp ‘lamp
(Gen Pl)’, lyžíc ‘spoon (Gen Pl)’, budov ‘building (Gen Pl)’, izieb
‘room (Gen Pl)’) and two neuter nouns (vedier ‘bucket (Gen Pl)’,
okien ‘window (Gen Pl)’). The frequency threshold was 10 000.


Phonological criteria 


The selection of words for a matrix test also has to follow two
phonological criteria. First, the pronunciation of words should be
identical in all possible combinations, that is, it is necessary to
deal with co-articulation and/or assimilation processes between
neighbouring words in a sentence. Second, the distribution of
phonemes in the words making up the matrix test should reflect the
distribution of phonemes in the given language.

In Slovak, regressive voice assimilation at word boundaries plays a
very important role in pronunciation. Basically, voiced obstruents
followed by a voiceless sound lose their voicing and become voiceless
(for example, pod stromom /pot stromom/), and voiceless obstruents
followed by a voiced sound gain voicing and become voiced (for
example, vlak mešká /vlag meʃka:/). Mistakes in voice assimilation
are noticeable and are usually evaluated as errors of orthoepy (for
details, see Kráľ, 2005: 53-62).
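
The two examples can be reproduced with a deliberately simplified Python sketch. It works on spelling rather than on a phonemic transcription, covers only single-letter obstruent pairs (digraphs such as ch, dz and dž are left out), and treats every onset that is not a voiceless consonant as voiced; the names `DEVOICE`, `VOICE`, `VOICELESS_ONSETS` and `assimilate_final` are hypothetical.

```python
# Voiced -> voiceless obstruent pairs, single letters only (simplification)
DEVOICE = {"b": "p", "d": "t", "ď": "ť", "g": "k", "z": "s", "ž": "š"}
# Voiceless -> voiced: the inverse mapping
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}
# Letters that open a voiceless onset; "ch" is covered by its first letter
VOICELESS_ONSETS = set("ptťkcčsšf")


def assimilate_final(word, next_word):
    """Return `word` with its final obstruent adapted to the next onset."""
    last = word[-1]
    onset = next_word[0].lower()
    if onset in VOICELESS_ONSETS:
        new_last = DEVOICE.get(last, last)
    else:
        # vowels, sonorants and voiced obstruents all count as voiced here
        new_last = VOICE.get(last, last)
    return word[:-1] + new_last


print(assimilate_final("pod", "stromom"))  # pot
print(assimilate_final("vlak", "mešká"))   # vlag
```

Words ending in a vowel or a sonorant pass through unchanged, which is the property exploited in the selection of names, verbs and numerals.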

To keep the sound form of the matrix test sentences as close to
natural pronunciation as possible, we had to keep in mind that, when
the test sentences are generated randomly, the voiced or voiceless
character of word-final segments may change depending on the voiced
or voiceless nature of the following sound. Consequently, in our
speech material no voice assimilation takes place in the sequence of
personal name and verb, irrespective of the combination, since all
personal names chosen for the matrix end either in a vowel or in a
sonorant (see Table 1). The voiced character of those sounds is not
affected by the following sound. The combination of verb and numeral
does not cause any difficulties either, since all present tense verb
forms (see above) end in a vowel. Taking into account the various
semantic and syntactic restrictions accompanying the selection of
adjectives, we had to find numerals that, in addition to meeting the
combinability requirements given by the inflectional character of
Slovak (specified above), end either in a vowel or in a sonorant. The
voice character of the word-initial segment of the following
adjective could then be of any value. All adjectives in the matrix
are in the genitive plural form, which is characterized by the suffix
-ých (-ích). The phoneme /x/², as a voiceless obstruent, changes into
its voiced counterpart /h/ when followed by a voiced element. In the
preliminary version of our matrix, when only the semantic and
frequency criteria were considered, all but three nouns started with
a voiced element. To preserve a uniform pronunciation of the
adjective-final consonant /x/, we had to replace those three nouns
with nouns starting with a voiced sound. As a result, the adjective
is pronounced with /h/ in combination with any noun from our matrix.


2 The symbols of the IPA are used for noting down phonemes (see,
e.g., Roach 2000).
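
The boundary conditions described above can be checked mechanically against the words of Table 1. The following Python sketch is only an illustration: it relies on spelling instead of a phonemic transcription, the letter classes are a simplifying assumption, the diacritics in the word list are restored from the table as an editorial assumption, and all identifiers are hypothetical.

```python
VOWELS = set("aáäeéiíoóôuúyý")
SONORANTS = set("mnňlľĺrŕj")
VOICELESS_ONSETS = set("ptťkcčsšf")  # "ch" is covered by its first letter

# Words of Table 1 (orthographic forms)
MATRIX = {
    "name": "Jano Peter Martin Jozo Pavol Mária Viera Anna Jana Eva".split(),
    "verb": "chce čaká dáva vidí hľadá drží pozná má berie nechce".split(),
    "numeral": "veľa tristo sto štvoro dvesto pár sedem osem málo mnoho".split(),
    "adjective": ("ďalších nových celých veľkých malých "
                  "starých dobrých zlých pekných iných").split(),
    "noun": ("domov lavíc mostov lámp vedier "
             "lyžíc okien budov nožov izieb").split(),
}


def ends_in_vowel_or_sonorant(word):
    """True if the last letter cannot undergo voice assimilation."""
    return word[-1].lower() in VOWELS | SONORANTS


def starts_voiced(word):
    """True if the word does not begin with a voiceless consonant."""
    return word[0].lower() not in VOICELESS_ONSETS


checks = {
    "names end in vowel/sonorant": all(map(ends_in_vowel_or_sonorant, MATRIX["name"])),
    "verbs end in vowel": all(w[-1] in VOWELS for w in MATRIX["verb"]),
    "numerals end in vowel/sonorant": all(map(ends_in_vowel_or_sonorant, MATRIX["numeral"])),
    "adjectives end in ch": all(w.endswith("ch") for w in MATRIX["adjective"]),
    "nouns start voiced": all(map(starts_voiced, MATRIX["noun"])),
}

for description, ok in checks.items():
    print(description, ok)
```

If any check failed, the pronunciation of a word-final segment would depend on which word happens to follow it in a randomly generated sentence.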


Phoneme distribution³


In the Slovak language, there are five short vowel phonemes (i, e,
a, o, u), five long vowel phonemes (i:, e:, a:, o:, u:), four
diphthongs (ia, ie, iu, uo) and 27 consonant phonemes (p, b, m, f, v,
t, d, n, l, r, s, z, ts, dz, c, ɟ, ɲ, ʎ, ʃ, ʒ, tʃ, dʒ, j, k, g, x,
h). The frequency of occurrence of the Slovak vowel and consonant
phonemes is represented graphically in Figure 1.


3 See note 2. 


Figure 1. The frequency distribution of the Slovak phonemes


The frequency distribution of phonemes in the matrix test designed
for Slovak (Table 1) is captured in Figure 2.


Figure 2. The frequency distribution of the Slovak phonemes in the matrix sentences


Figure 3 shows the comparison of the so-called reference
distribution of the Slovak phonemes (Fig. 1) with the distribution of
phonemes in the words we have chosen for the Slovak matrix test.


Figure 3. The reference distribution of the Slovak phonemes and the distribution of the Slovak phonemes in matrix sentences


As follows from Figure 3, the occurrence of phonemes in the matrix
test designed for Slovak corresponds to the general (reference)
frequency distribution of Slovak phonemes. The figure indicates a
higher occurrence of the phonemes /i:/, /a:/ and /x/ in the matrix
sentences. This discrepancy is easily explained by the structure of
the sentences: each verb is in the 3rd person singular present tense,
where the suffix -á /a:/ dominates, and each adjective is in the
genitive plural form ending, as already mentioned, in the suffix -ých
(-ích) /i:x/⁴.
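
How such a distribution can be derived from the word list is sketched below in Python. This is a rough, spelling-based approximation and not the procedure used for Figures 2 and 3: only the digraph ch and the long vowels are mapped to phoneme symbols, diphthongs and palatalization are ignored, the diacritics in the word list are restored from Table 1 as an editorial assumption, and all identifiers are hypothetical. Since every word of the matrix occurs equally often in the generated sentences, counting each word once is sufficient.

```python
from collections import Counter

# Words of Table 1 (orthographic forms)
WORDS = """
Jano Peter Martin Jozo Pavol Mária Viera Anna Jana Eva
chce čaká dáva vidí hľadá drží pozná má berie nechce
veľa tristo sto štvoro dvesto pár sedem osem málo mnoho
ďalších nových celých veľkých malých starých dobrých zlých pekných iných
domov lavíc mostov lámp vedier lyžíc okien budov nožov izieb
""".split()

# Long vowels written with an acute accent; í and ý share one phoneme
LONG = {"á": "a:", "é": "e:", "í": "i:", "ý": "i:", "ó": "o:", "ú": "u:"}


def rough_phonemes(word):
    """Map spelling to approximate phoneme symbols (crude simplification)."""
    word = word.lower()
    symbols, i = [], 0
    while i < len(word):
        if word.startswith("ch", i):
            symbols.append("x")  # digraph ch stands for /x/
            i += 2
            continue
        letter = word[i]
        symbols.append("i" if letter == "y" else LONG.get(letter, letter))
        i += 1
    return symbols


counts = Counter(p for w in WORDS for p in rough_phonemes(w))
total = sum(counts.values())
share = {p: n / total for p, n in counts.items()}

for phoneme in ("i:", "a:", "x"):
    print(phoneme, counts[phoneme], round(100 * share[phoneme], 1))
```

Comparing `share` with a reference distribution for Slovak, which is not reproduced here, would correspond to the comparison shown in Figure 3.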


CONCLUSION AND IMPLICATIONS 
FOR FURTHER RESEARCH 


The paper outlines the process of creating a matrix test, a method
used in the diagnostics of hearing impairment, for the Slovak
language. Matrix tests for 20 languages have been developed to date
or are still under development. Slovak is a language with very rich
inflectional morphology, and thus the preparation of a matrix
consisting of 10 proper names, 10 verbs, 10 numerals, 10 adjectives
and 10 nouns that, when combined randomly, yield meaningful and
grammatically correct sentences with the SVO structure was a real
challenge.

First, the selection of the proper lin- 
guistic material was based on the frequency 
of the occurrence in everyday communicative 
situations and semantic neutrality of the se- 
lected words. 

Then, the matrix consisting of 50 words falling into five syntactic
categories was re-evaluated so as to fulfil phonological criteria as
well. The result is a matrix (Table 1) that makes it possible to
produce grammatically correct and semantically unpredictable
sentences whose sound form respects natural Slovak pronunciation and
in which the frequency distribution of the phonemes corresponds to
the general distribution of Slovak phonemes.


4 See Warzybok et al. 2015 for similar results in Russian.

Our matrix test for Slovak thus follows the recommendations of the
International Collegium of Rehabilitative Audiology as specified in
Akeroyd et al. (2015). The recommendations supplement the standard
ISO 8253-3:2012 Acoustics - Audiometric test methods - Part 3: Speech
audiometry. In the next step, optimization and evaluation
measurements will have to be carried out to measure the speech
intelligibility function and to establish the correct reference data,
that is, the data for listeners with normal hearing. Then, we hope,
the matrix test can be used in audiometric practice.